IRLE WORKING PAPER #107-14, April 2014

Authors

  • Aaron Chatterji
  • Rodolphe Durand
  • David Levine
  • Samuel Touboul
Abstract

Raters of corporations play an important role in assessing domains ranging from sustainability to corporate governance to best workplaces. Scholars increasingly rely on these ratings to test theories about corporate social responsibility (CSR), corporate governance and the influence of stakeholders. Though these raters frequently develop sophisticated methodologies, we find they often diverge in their ratings of the same firm, creating uncertainty for managers and stakeholders, and also posing challenges for researchers. We document the surprising lack of convergence of social ratings for the first time using six well-established socially responsible investing (SRI) raters, with comparisons of overlap, correlations, and regression analysis. Our results suggest that scholars should interpret empirical results with caution and at least use multiple ratings schemes in studies of CSR and governance.

Do Ratings of Firms Converge? Implications for Strategy Research

In 2010, professional fund managers in the U.S. invested more than $3 trillion under the banner of socially responsible investing (SRI). The enormous amount of capital allocated to SRI has drawn considerable attention from scholars, activists, managers, and policymakers interested in the drivers of corporate social responsibility (CSR). Some CSR advocates praise SRI, believing that it can direct capital toward the most responsible firms while penalizing firms with poor social performance. Skeptics argue that the organizations that rate the social performance of enterprises, referred to as “raters” or “SRI raters” in our study, cannot discern which firms are socially responsible. For example, Hawken (2004) points out that the various methodologies employed by socially responsible raters allow for almost any public firm to be a member of at least one SRI index. Entine (2003) presents several examples of raters giving high marks to firms that were later embroiled in famous scandals.
Delmas, Etzion and Nairn-Birch (2013) show that different raters may use different methods to measure firms’ environmental performance.

Academics have produced dozens of articles on CSR and SRI over the past two decades, with growing interest in recent years (Orlitzky, Schmidt, & Rynes, 2003). For example, from 1994 to 2008, seven articles published in SMJ relied on KLD data. From 2009 to 2013, 19 articles used KLD and 6 articles employed FTSE4Good, Innovest, DJSI or Asset4. Notably, influential research has examined the effects of SRI on returns for investors and the cost of capital for managers (Galema, Plantinga, & Scholtens, 2008; Waddock, 2003). Other research examines the drivers of CSR, such as profit-maximizing responses to heterogeneous consumer preferences (Mackey, Mackey, & Barney, 2007), imitation among firms, or a departure from profit-maximizing behavior to satisfy managers’ private goals (Marquis, Glynn, & Davis, 2007; Devinney, 2009).

1 Social Investment Forum Foundation, Report on Socially Responsible Investing Trends in the United States, 2010. According to this source, as of 2010, socially responsible investments are nearly 12.2% of the total funds managed by professional investors. This percentage has grown markedly since 2005, when $2 trillion, or 10% of total funds, were invested in accordance with socially responsible guidelines.
2 We use the term “raters” or “SRI raters” to refer to organizations that assess corporate social responsibility.

A key question of this study is whether raters converge to valid assessments of firms’ social activities and performance. Despite growing interest in CSR, little research examines whether raters measure CSR accurately (Sharfman, 1996; Delmas et al., 2013). If these metrics are invalid or are inconsistently applied across raters, scholars who conduct analysis using one rating scheme risk drawing conclusions that are not generalizable.
Lack of convergence among raters would also pose significant challenges for practice. Socially responsible firms seeking to improve their CSR should be able to understand whether poor ratings are due to poor results, a different conceptualization of CSR than the raters, or poor measurement methods (Margolis & Walsh, 2003; Gray, 2010). Furthermore, if ratings cannot consistently identify socially responsible firms, the hypothesized benefits of SRI cannot occur. In the worst-case scenario, if firms expend resources to achieve high scores on invalid metrics, then even well-intended attention to social metrics reduces social welfare. Thus, it is crucial, from both the academic and the practical perspective, to understand the validity of social ratings and the dynamics driving convergence across raters.

In this paper, we first document that the ratings of six major social raters (KLD, Asset4, Calvert, FTSE4Good, DJSI, and Innovest) have little overlap in membership and fairly low correlations with each other. Our results imply not only that SRI raters do not agree on one definition of sustainability (their “theorization” of CSR or what they measure), but also that raters may measure the same construct in different ways (the “commensurability” of CSR dimensions or how raters measure the same indicators). The validity of social ratings is a serious concern not just for academics, but also for investors, activists, and policymakers. Our findings suggest scholars should interpret prior empirical studies using CSR ratings with appropriate caution and, at the very least, replicate studies using alternative ratings schemes.

3 When discussing the behavior of raters, we use the term “convergence.” When referring to the rating they provide, we use the term “convergent validity.”

APPROACHING CONVERGENCE

While we have broad agreement in the field on how to measure financial performance, assessing social performance is inherently more challenging. The literature on social evaluations of firms and organizations establishes that two mechanisms drive convergence. First, “theorization” makes clear precisely what raters assess and why it matters (Durand, Rao, & Monin, 2007; Hsu, Roberts, & Swaminathan, 2012). Next, “commensurability” of indicators makes comparison across evaluated organizations possible (Espeland & Sauder, 2007; Sauder & Espeland, 2009).

“Theorization”, according to Rao et al. (2003), is the conceptual discourse produced by a rater (e.g. Michelin in haute cuisine, US News in higher education) that associates actions with outcomes and allows organizations to expect (1) better rankings from changes in behavior and (2) the accompanying benefits from these changes, such as more customers. When there is a clear theorization, organizations can adjust their behaviors, or choose not to. We use the term “theorization” to refer to the beliefs raters may have about what being socially responsible means. A “common theorization” refers to agreement across raters on a common definition of CSR; for example, about the dimensions social investors should care about (e.g. environmental, social, and corporate governance), or about industries that social investors should consider as inherently irresponsible (e.g. nuclear energy, weapons, tobacco).

“Commensurability” of a construct is high when different raters measure the same construct in a similar fashion. For instance, in financial ratings, measurements and interpretation of the construct “debt/equity ratio” are similar across various rating agencies. We use the term “commensurability” to refer to the extent that raters are using the same (or at least similar) measures and methods to assess the same construct (e.g. employee safety or independent board).
Simply put, common theorization among SRI raters is overlap in what raters choose to measure, and commensurability is overlap in how they measure corporate social responsibility. In any given domain, raters are more likely to converge around valid measures when the raters share the same theory of what good performance means (“theorization”) and what indicators are valid proxies for that good performance (“commensurability”).

Common theorization

When evaluating the extent of common theorization across SRI raters, there are at least three aspects of measurement to consider. First, what high-level categories (e.g., environmental, social, governance) do the raters measure? Second, do the raters screen out particular industries such as tobacco and firearms? Third, do raters normalize their ratings by industry such that a firm is compared to the other firms in its own industry?

In terms of high-level categories, there is broad agreement on the components of social responsibility. Rhetorically, the marketing materials of the raters we study all seem fairly similar in describing their goals.
For example, one of FTSE4Good’s stated goals is “to provide investors with the opportunity to gain exposure to companies that meet globally recognized corporate responsibility standards.” KLD asserts that its “research is designed for investors and money managers who integrate environmental, social and governance factors into their investment process.” Calvert describes its ratings as “a broad-based, rigorously constructed benchmark for measuring the performance of large, US based companies following sustainable and responsible policies...”, and Asset4 claims to “provide objective, relevant and systematic environmental, social and governance information” that “professional investors use to define a wide range of responsible investment strategies.” In addition, all of the indexes cover similar high-level topics, including environmental and social performance.

4 While our empirical analysis utilizes data from 2002-2010, we have tried to provide more recent information where possible, including: FTSE4Good Index Series http://www.ftse.com/Indices/FTSE4Good_Index_Series/Downloads/Brochure_english.pdf (Last accessed March 1st, 2012); KLD’s Research Products http://www.kld.com/research/index.html (Last accessed August 13th, 2007); Calvert-About the Ratings http://www.calvert.com/sri-index.html (Last accessed March 1st, 2012); Asset4 ESG content overview http://thomsonreuters.com/products_services/financial/content_news/content_overview/content_az/content_esg/ (Last accessed February 8th, 2012).

However, there are some key differences across the raters. Some raters consider additional high-level categories. For example, KLD and Asset4 rate firms according to their products’ safety, while other raters do not. Asset4 and DJSI explicitly consider economic dimensions, while other raters do not. KLD, Asset4, FTSE4Good and Innovest consider Corporate Governance as part of CSR while Calvert and DJSI do not.
Interestingly, the geographic origin of the rater appears to have some influence on their theorization of CSR. As an example, KLD, a U.S. rater, has 71% of its subcategories in the social issues domain. KLD therefore puts more weight on social issues than Asset4, a European rater, which has only 47% of its subcategories related to social issues. In other domains, such as issues relating to employees, Asset4 appears to place more emphasis than KLD. While both Asset4 and KLD consider employee diversity, the firm’s impact on local communities and its respect of human rights, Asset4 clearly differentiates between employees’ health and safety, training programs, and labor relations. KLD includes all of those topics under the broad umbrella of “employment”.

Further differences in theorization appear when considering the use of screens for particular industries. Three of the six raters (KLD, Calvert, and FTSE4Good) use explicit screens to exclude firms with substantial investments in categories like tobacco and firearms, though they each define “substantial” differently. Even among this group, FTSE4Good and KLD screen out firms involved in nuclear power, while Calvert does not. Finally, four of the six raters normalize their ratings by industry (KLD and Asset4 are the exceptions). These raters assert that CSR performance must be measured relative to industry peers (see Table 1).

Insert Table 1 about here

5 Community, Governance, Diversity, Employment, Environment, Human Rights, Product.
6 Function of the board of directors, Structure of the board of directors, Compensation of the board of directors, Vision and strategy, Shareholders, Emission reduction, Product Innovation, Resource Reduction, Product Responsibility, Community, Human Rights, Diversity, Employment Quality, Health and safety, Training and development.

The upshot is that despite similar language there do appear to be differences in the way various raters envision CSR and which firms should be evaluated in the first place.

Commensurability

Low convergent validity due to lack of common theorization is still consistent with high validity of raters, if each of them is trying to measure a different definition of “good CSR.” For example, it is not a critique of either rater if the lists of “100 best cheap eats” and “100 best fine dining” do not overlap, as each has a different theory of what diners are looking for. Similarly, users of social ratings may differ in what dimensions of CSR they value (Crilly, Zollo, & Hansen, 2012; Delmas & Toffel, 2008; Philippe & Durand, 2011). Some investors may wish to avoid profiting from activities they feel are harmful, leading them to desire screens based on whether a firm sells certain products. Other investors may wish to encourage high effort by managers, leading them to focus on ratings that are defined relative to an industry, not an absolute scale. In that case, low correlations across social ratings could still be consistent with valid measurement by each rater, because raters would be simply appealing to different groups.

However, low convergent validity will still be present in the case of low commensurability across raters, that is, when ratings of the same construct disagree due to differences in measurement. Thus, if we adjust for different theorizations (what constructs raters measure), the convergent validity of ratings will be determined by differences in commensurability (how raters measure the same constructs).

Commensurability is inherently a serious challenge for SRI raters. For example, it is unclear exactly how to measure superior human resource management, or which indicators to use to measure higher-than-average toxic releases.
Similarly, raters must quantify information that is difficult to measure, such as the social impact of additional minority representation on the board of directors, or the social impact of having business interests in a nation that is ruled by a totalitarian regime.

Raters make a significant effort to persuade potential investors that their methods and ratings are based on careful analysis of high-quality data (Chatterji, Levine, & Toffel, 2009). The implication is that they measure the indicated constructs with high validity. For example, all of the social raters claim they draw on multiple sources and use multiple research methods, both of which are established scientific approaches: they all review official government data (e.g., on toxic emissions and regulatory actions), explore company documents and press reports, and conduct interviews. Our research confirms that all the raters (except Asset4) also do surveys, though they employ different methodologies.

All of these raters have marketing materials that stress how carefully they analyze companies’ social, governance, and environmental records. They often compare themselves to traditional financial research firms. For example, KLD describes its services as “analogous to those provided by financial research service firms.” Not coincidentally, Dow Jones and the Financial Times (creators of DJSI and FTSE4Good) and Thomson-Reuters (owner of Asset4) are also well-known providers of traditional financial information.

Nevertheless, raters use different methods and variables to measure the same construct. Some raters measure environmental performance with indicators of a firm’s environmental processes, while others concentrate on the firm’s environmental outcomes (Delmas et al., 2013).
For example, raters such as KLD give credit for products with beneficial impact on the environment, while others, like FTSE4Good, employ metrics that assess the procedures to identify and fix environmental hazards, in the spirit of the ISO 14001 management standards. In general, these differences in commensurability are difficult for investors to observe.

In sum, there are two possibilities regarding convergent validity of SRI ratings after adjusting for theorization. If commensurability is high, adjusting for different theorizations should substantially increase convergent validity. For example, if all raters measure environmental performance in the same way, convergent validity should be high. Alternatively, it is possible that the raters may themselves be uncertain about how to accurately measure each dimension of social responsibility. Hence, we might expect that even after adjusting for differences in theorization, convergent validity will remain low. In this case, if convergent validity is low for a pair of raters rating the same constructs, at least one of the raters has low validity as well. Below, we perform these tests to assess the convergence of SRI raters.

DATA

To test the convergence of SRI raters, we examine the ratings of a common universe of companies from six leading social raters: KLD, Asset4, Innovest, DJSI, FTSE4Good and Calvert. Taken together, these raters and ratings are among the most popular and well established in the field. These data cover the 2002-2010 period for KLD and Asset4. For the other raters, we have data for selected years: 2004 for DJSI, 2005 for Calvert and Innovest, and 2006 for FTSE4Good. In all instances, we compare ratings provided in the same year, unless otherwise noted. Our dataset provides a global view of the industry, with KLD, Calvert, and Dow Jones based in the U.S., Innovest in Canada, while FTSE4Good and Asset4 have origins in the European Union.
7 SustainAbility report, Rate the Raters Phase Two, Taking Inventory of the Ratings Universe, 2010. This report lists all of these raters, except for Calvert, among their top 16 raters in terms of credibility. Note that KLD purchased Innovest at the time of this report. We included Calvert since it is regarded as one of the oldest and most well-known raters in this space.
8 FTSE4Good is based in the UK, while Asset4 is in Switzerland.

The raters have broadly similar processes to develop ratings. They collect raw quantitative and qualitative data on specific information (production of tobacco-based products, CO2 emissions, election of trade-union representatives, etc.). The raters then implement proprietary methodologies to issue scores on high-level categories such as environmental impact, human rights compliance, and governance. Finally, raters typically provide a list of companies they consider most responsible, most often in an equity index for potential investors.

To assemble the data, we started with each rater’s index of socially responsible companies and the broader universe of company stocks from which the index list was selected (S&P500, Russell 1000). Our first task was to denote the firms that had been included on each rater’s index of top social investments. Thus, we assigned a “1” to firms included in the KLD Domini 400 Social Index, the Calvert Social Index, the FTSE4Good Index, the DJSI World Index, Innovest’s 18 U.S.-based firms in its “Top 100 Leaders in Sustainability,” and Asset4 firms which received an A+ grade. We assigned a “0” to firms in the eligible universe but not in these indexes. In sum, we obtained membership data for 3,134 firms from six different indexes’ universes. The universe common to all raters includes 551 firms in 2004, 413 in 2005 and 538 in 2006, and is most comparable to the S&P 500. Table A1 in the Appendix summarizes the raters’ universes.
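The 1/0 membership coding described above lends itself to a small illustration. The sketch below is not the authors' code: the firm names and membership flags are invented, and the overlap measure used here (firms picked by both raters divided by firms picked by either) is only one plausible way to summarize "overlap in membership". The correlation is an ordinary Pearson correlation computed on the 0/1 indicators over the universe common to all raters.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: 1 = firm is in the rater's index,
# 0 = firm is in the rater's eligible universe but not in the index.
membership = {
    "KLD":       {"FirmA": 1, "FirmB": 1, "FirmC": 0, "FirmD": 0, "FirmE": 1},
    "Calvert":   {"FirmA": 1, "FirmB": 0, "FirmC": 0, "FirmD": 1, "FirmE": 1},
    "FTSE4Good": {"FirmA": 0, "FirmB": 1, "FirmC": 1, "FirmD": 0, "FirmE": 1},
}


def common_universe(membership):
    """Return the firms that every rater has coded (as 0 or 1)."""
    firm_sets = [set(flags) for flags in membership.values()]
    return sorted(set.intersection(*firm_sets))


def overlap_share(a, b):
    """Assumed overlap measure: firms picked by both / firms picked by either."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else float("nan")


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when one rater picks all or no firms
    return cov / sqrt(var_a * var_b)


firms = common_universe(membership)
pairwise = {}
for rater_1, rater_2 in combinations(membership, 2):
    a = [membership[rater_1][firm] for firm in firms]
    b = [membership[rater_2][firm] for firm in firms]
    pairwise[(rater_1, rater_2)] = (overlap_share(a, b), pearson(a, b))

for pair, (overlap, corr) in pairwise.items():
    print(pair, round(overlap, 3), round(corr, 3))
```

Restricting the comparison to the common universe matters: a firm that one rater never considered is not evidence of disagreement, so it should not be counted as a "0" for that rater.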
In addition to membership, we collected more detailed data for all firms rated by KLD and Asset4 between 2002 and 2010, and for some firms rated by Calvert and Innovest in 2005, and by DJSI in 2004. For KLD, we had 98 detailed sub-scores, which rated each company on more specific aspects of its environmental and social performance. The KLD sub-scores consist of 1/0 indicators for a strength or a concern on topics such as waste recycling, involvement in military products, and emissions of ozone-depleting gases. Those strengths and concerns are grouped in 7 categories (Environment, Community, etc.). We used these sub-scores in two different ways. First, we computed the sum of strengths minus the sum of concerns per category. Second, we estimated KLD category scores with the predictions from a logit model that considered membership in the KLD DS400 as a binary dependent variable, and KLD strengths and concerns per category as independent variables. We refer to this second measure of KLD scores as “the probability of inclusion in DS400”. For Asset4, we accessed scores for the four high-level categories and corresponding 18 sub-scores.

9 Community, Diversity, Employment, Corporate Governance, Environment, Human Rights, Products.
10 Economic (Economic Performance, Shareholders’ Loyalty, Clients Loyalty), Governance (Board Functions, Board Structure, Compensation Policy, Vision and Strategy, Shareholder Rights), Environment (Emission Reduction, Product Innovation, Resource Reduction), Social (Product Responsibility, Community, Human Rights, Diversity and Opportunity, Employment Quality, Health & Safety, Training and Development).

We had fewer details on other raters’ sub-scores. For Calvert, we had five high-level scores, but only for the 100 largest firms they rate.
For DJSI, we had scores for its three high-level categories and for 78 firms which represented the within-industry top 10% of firms plus one “runner-up” per industry. Innovest computes its index by first issuing each firm a numerical score, which is then normalized per industry to become a letter grade (AAA down to CCC). This letter grade is used as an indication of index membership. We had access to Innovest’s letter grades for each firm in their universe and for three high-level categories (Social, Environment, and Governance). We transformed those grades into a 1 to 7 score for our analysis.
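Two of the score constructions described in this section are simple enough to sketch: the KLD category score computed as the sum of strengths minus the sum of concerns, and the conversion of Innovest letter grades into a 1 to 7 score. The code below is an illustration under stated assumptions, not the authors' procedure. The text names only the endpoints of the Innovest scale (AAA and CCC), so the intermediate grades and the direction of the mapping (AAA = 7, CCC = 1) are assumptions, and the sample indicators are invented.

```python
# Assumed seven-step grade scale, ordered from worst to best. Only the
# endpoints (AAA and CCC) are named in the text; the rest is an assumption.
INNOVEST_GRADES = ["CCC", "B", "BB", "BBB", "A", "AA", "AAA"]
GRADE_TO_SCORE = {grade: rank + 1 for rank, grade in enumerate(INNOVEST_GRADES)}


def innovest_score(grade):
    """Map a letter grade to an integer from 1 (CCC) to 7 (AAA)."""
    return GRADE_TO_SCORE[grade.strip().upper()]


def kld_net_scores(indicators):
    """Sum of strengths minus sum of concerns, per category.

    `indicators` is an iterable of (category, kind, value) tuples, where
    kind is "strength" or "concern" and value is the 1/0 indicator.
    """
    totals = {}
    for category, kind, value in indicators:
        if kind not in ("strength", "concern"):
            raise ValueError(f"unknown indicator kind: {kind!r}")
        sign = 1 if kind == "strength" else -1
        totals[category] = totals.get(category, 0) + sign * value
    return totals


# Hypothetical indicators for a single firm-year.
sample_indicators = [
    ("Environment", "strength", 1),
    ("Environment", "strength", 1),
    ("Environment", "concern", 1),
    ("Community", "strength", 0),
    ("Community", "concern", 1),
]

net = kld_net_scores(sample_indicators)
print(net)
print(innovest_score("AAA"), innovest_score("CCC"))
```

The second KLD measure described above, the predicted probability of inclusion in DS400 from a logit model, would take these per-category strengths and concerns as regressors; it is left out here because the text does not give the model specification in enough detail to reproduce it.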


Publication date: 2014